Data Set Balancing

Author

  • David L. Olson
Abstract

This paper reports experiments with three skewed data sets, seeking to demonstrate the problems that arise when skewed data is used and to identify the counter-problems that arise when data is balanced. Three basic data mining approaches (decision tree, regression-based, and neural network models) are considered, using both categorical and continuous data. Two of the data sets have binary outcomes, while the third has four possible outcomes. The key findings are that when the data is highly unbalanced, algorithms tend to degenerate by assigning all cases to the most common outcome. When data is balanced, accuracy rates tend to decline. Balancing also reduces the training set size, which can lead to model failure when the reduced training set omits cases that are encountered in the test set. Decision tree algorithms were found to be the most robust with respect to the degree of balancing applied.
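
The two effects described in the abstract can be illustrated with a minimal sketch. It assumes that balancing is done by random undersampling of the larger classes, which the abstract does not specify, and it uses a made-up 950/50 label split rather than any of the paper's data sets. The function names `majority_baseline_accuracy` and `undersample` are hypothetical and chosen only for this illustration.

```python
import random
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a degenerate model that assigns every case to the most common outcome."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def undersample(rows, labels, seed=0):
    """Balance classes by randomly keeping, per class, as many rows as the rarest class has.

    Assumption: random undersampling stands in for whatever balancing the paper used.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    balanced_rows, balanced_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            balanced_rows.append(row)
            balanced_labels.append(label)
    return balanced_rows, balanced_labels


# Hypothetical skewed data: 950 cases of outcome 0 and 50 cases of outcome 1.
labels = [0] * 950 + [1] * 50
rows = list(range(len(labels)))  # stand-in feature rows

skewed_accuracy = majority_baseline_accuracy(labels)
balanced_rows, balanced_labels = undersample(rows, labels)
balanced_accuracy = majority_baseline_accuracy(balanced_labels)

print(skewed_accuracy)     # 0.95
print(len(balanced_rows))  # 100
print(balanced_accuracy)   # 0.5
```

On the skewed labels, always predicting the most common outcome already scores 0.95, which is why a degenerate model can look accurate. After undersampling, that shortcut is worth only 0.5, but the training set shrinks from 1000 rows to 100, so some kinds of cases may no longer be represented at all.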

Related resources

Multi-objective scheduling and assembly line balancing with resource constraint and cost uncertainty: A “box” set robust optimization

Assembly lines are flow-oriented production systems that are of great importance in the industrial production of standard, high-volume products and even more recently, they have become commonplace in producing low-volume custom products. The main goal of designers of these lines is to increase the efficiency of the system and therefore, the assembly line balancing to achieve an optimal system i...

Assembly line balancing to minimize balancing loss and system loss

Assembly line production is one of the widely used basic principles in production systems. The assembly line balancing problem deals with the distribution of activities among workstations so as to maximize the utilization of human resources and facilities without disturbing the work sequence. Research reported in the literature mainly deals with minimization of idle time i.e...

A New Balancing and Ranking Method based on Hesitant Fuzzy Sets for Solving Decision-making Problems under Uncertainty

The purpose of this paper is to extend a new balancing and ranking method to handle uncertainty for a multiple attribute analysis under a hesitant fuzzy environment. The presented hesitant fuzzy balancing and ranking (HF-BR) method does not require attributes’ weights through the process of multiple attribute decision making (MADM) under hesitant conditions. For the rating of possible alternati...

Mixed-Model Assembly Line Balancing with Considering Reliability

This paper presents a multi-objective simulated annealing algorithm for mixed-model assembly line balancing with stochastic processing times. Since stochastic task times may affect the bottlenecks of a system, maximizing the weighted line efficiency (equivalent to minimizing the number of stations), minimizing the weighted smoothness index and maximizing the system reliabil...

A Multi-Objective Particle Swarm Optimization for Mixed-Model Assembly Line Balancing with Different Skilled Workers

This paper presents a multi-objective Particle Swarm Optimization (PSO) algorithm for worker assignment and mixed-model assembly line balancing problem when task times depend on the worker’s skill level. The objectives of this model are minimization of the number of stations (equivalent to the maximization of the weighted line efficiency), minimization of the weighted smoothness index and minim...

A Hybrid Unconscious Search Algorithm for Mixed-model Assembly Line Balancing Problem with SDST, Parallel Workstation and Learning Effect

Because of product variety, simultaneous production of different models plays an important role in production systems. Moreover, considering realistic constraints in designing production lines has attracted a lot of attention in recent research. Since the assembly line balancing problem is NP-hard, efficient methods are needed to solve this kind of problem. In this study, a new hybrid met...

Publication date: 2004